A Statistical Correction-Rejection Strategy for OCR Outputs in Persian Personal Information Forms

Authors

R. Mehran

A. Shali

F. Razzazi

Abstract

In this paper, a maximum a posteriori (MAP) statistical modeling approach is used to correct and verify OCR outputs for Persian first names and surnames. In addition, an efficient neural-network-based rejection method is presented and tested. Because of the large variety of Persian surnames, a statistical grammar has been added to the MAP strategy to generate new surnames that are not included in the dictionary. The model has been analytically formulated and practically implemented. The results show a large reduction in character and word errors, while the added computation is negligible compared with the complexity of character recognition.
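The abstract does not spell out the model, so the following Python sketch only illustrates the general shape of MAP correction under stated assumptions: the corrected word is chosen as the candidate w that maximizes P(w) · P(o | w), where o is the OCR output string. It assumes a substitution-only character channel with uniform confusions, a unigram prior over a toy transliterated name list, and a hypothetical stem-plus-suffix grammar (suffixes such as -i, -zadeh and -pour are common in Persian surnames) so that the candidate set can include surnames absent from the lexicon. All names (NAME_COUNTS, SURNAME_STEMS, SURNAME_SUFFIXES, channel_likelihood, map_correct) and all numbers are illustrative, and the neural-network rejector described in the abstract is replaced here by a simple threshold on the posterior.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# Hypothetical toy lexicon of first names with made-up counts (transliterated, not real data).
NAME_COUNTS: Dict[str, int] = {"ali": 50, "alireza": 20, "reza": 30, "sara": 25}

# Hypothetical toy surname grammar: surname = stem + suffix, P(surname) = P(stem) * P(suffix).
SURNAME_STEMS: Dict[str, float] = {"ahmad": 0.5, "hasan": 0.3, "karim": 0.2}
SURNAME_SUFFIXES: Dict[str, float] = {"": 0.4, "i": 0.3, "zadeh": 0.2, "pour": 0.1}


def channel_likelihood(observed: str, true: str,
                       p_correct: float = 0.9, alphabet_size: int = 32) -> float:
    """P(observed | true) under an assumed substitution-only OCR channel.

    Each character is read correctly with probability p_correct; otherwise it is
    confused uniformly with one of the other alphabet_size - 1 symbols.
    Insertions and deletions are not modelled, so lengths must match.
    """
    if len(observed) != len(true):
        return 0.0
    p_sub = (1.0 - p_correct) / (alphabet_size - 1)
    prob = 1.0
    for o, t in zip(observed, true):
        prob *= p_correct if o == t else p_sub
    return prob


def name_priors() -> Dict[str, float]:
    """Unigram prior P(w) estimated from the toy name counts."""
    total = sum(NAME_COUNTS.values())
    return {w: c / total for w, c in NAME_COUNTS.items()}


def surname_priors() -> Dict[str, float]:
    """Prior over every surname the toy grammar can generate, including unseen ones."""
    priors: Dict[str, float] = {}
    for (stem, p_stem), (suffix, p_suffix) in product(SURNAME_STEMS.items(),
                                                      SURNAME_SUFFIXES.items()):
        word = stem + suffix
        # Accumulate in case two derivations produce the same string.
        priors[word] = priors.get(word, 0.0) + p_stem * p_suffix
    return priors


def map_correct(observed: str, priors: Dict[str, float],
                reject_threshold: float = 0.9) -> Tuple[Optional[str], float]:
    """Return (MAP word, or None if rejected; posterior of the best candidate)."""
    scores = {w: p * channel_likelihood(observed, w) for w, p in priors.items()}
    evidence = sum(scores.values())
    if evidence == 0.0:
        return None, 0.0  # no candidate can explain the OCR string: reject
    best = max(scores, key=scores.get)
    posterior = scores[best] / evidence
    # Reject when the posterior is not confident enough (stand-in for the paper's NN rejector).
    return (best if posterior >= reject_threshold else None), posterior


if __name__ == "__main__":
    print(map_correct("ahmadl", surname_priors()))  # misread "ahmadi" is corrected
    print(map_correct("raxa", name_priors()))       # ambiguous between "reza" and "sara": rejected
```

A practical system would estimate the confusion probabilities and priors from data and would also need to handle insertions and deletions, for example with an edit-distance channel model.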

Similar Resources

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian also has multi-part words. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, yet in many cases people incorrectly use a space character instead of a half-space. This common incorrect use of space leads to some s...

Retrieving Arabic Printed Document: a Survey

This paper surveys some of the literature pertaining to searching and retrieving OCR’ed printed documents with emphasis on Arabic documents. It examines peculiarities of Arabic morphology, orthography, retrieval, word clustering, display, OCR, and error correction. The paper surveys existing evaluation test-beds for retrieval of Arabic OCR texts. Lastly, it concludes with possible directions fo...

Linguistic Error Correction Of Japanese Sentences

This paper describes a newly developed linguistic error correction system, which can correct errors and rejections of Japanese sentences by using linguistic knowledge. Conventional optical character readers (OCR) need human assistance to correct their recognition errors and rejections. An operator must teach the OCR correct answers whenever an illegible character pattern occurs. If this error c...

Japanese OCR Error Correction using Character Shape Similarity and Statistical Language Model

We present a novel OCR error correction method for languages without word delimiters that have a large character set, such as Japanese and Chinese. It consists of a statistical OCR model, an approximate word matching method using character shape similarity, and a word segmentation algorithm using a statistical language model. By using a statistical OCR model and character shape similarity, the ...

Generating a Training Corpus for OCR Post-Correction Using Encoder-Decoder Model

In this paper we present a novel approach to the automatic correction of OCR-induced orthographic errors in a given text. While current systems depend heavily on large training corpora or external information, such as domain-specific lexicons or confidence scores from the OCR process, our system only requires a small amount of relatively clean training data from a representative corpus to learn...

Publication date: 2005